Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add paging to RSS results #366

Open
wants to merge 24 commits into
base: 7.x
Choose a base branch
from
Open

Conversation

bondjimbond
Copy link

JIRA Ticket: (https://jira.duraspace.org/browse/ISLANDORA-2439)

What does this Pull Request do?

Adds paging to Solr RSS results.

What's new?

Adds additional <link> elements in the <channel> element that provide the next, previous, and first pages.

How should this be tested?

  • Enable the RSS secondary display profile in your Solr settings
  • Run a search with a lot of results (e.g. an empty search) and click the RSS button
  • Note the new links in the <channel> section
  • Note that your search is limited to the number of results in your RSS settings
  • Put the content of the "next" link into your browser
  • Result should contain the next X results of your search

Interested parties

@Islandora/7-x-1-x-committers

@bondjimbond bondjimbond requested a review from DiegoPino June 10, 2019 15:55
@bondjimbond
Copy link
Author

Looks like the links produced here aren't working because at some point in the process (after format_rss_channel passes $args to format_xml_elements) the ampersands get encoded. The resulting URL is no good.

Any suggestions for overcoming the encoded ampersand problem?

@jonathangreen
Copy link

You should probably use drupal_get_query_parameters to get the parameters instead of the manual parsing you are doing here. You can see how its used in query_processor.inc. I would recommend using the Drupal url function to make the links.

@bondjimbond
Copy link
Author

@jonathangreen I'm not sure that applies here... This is what I get from drupal_get_query_parameters:

Array
(
    [type] => dismax
    [islandora_solr_search_navigation] => 0
    [solr_profile] => rss
)

I've fixed the link problem -- of course I realized these should have been value instead of attributes, so we're good to go.

@bondjimbond
Copy link
Author

Crap, never mind - it's still escaping. I'm sort of at a loss here..

@bondjimbond
Copy link
Author

@jonathangreen OK, all is well. Apparently you can add encoded => TRUE to the list of array attributes. Ready to test if you don't mind.

Copy link

@jonathangreen jonathangreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see the URL building and handling of page arguments to be done in a more drupaly way here instead of using the PHP variables and string concatenation.

islandora_solr_config/includes/rss_results.inc Outdated Show resolved Hide resolved
islandora_solr_config/includes/rss_results.inc Outdated Show resolved Hide resolved
islandora_solr_config/includes/rss_results.inc Outdated Show resolved Hide resolved
islandora_solr_config/includes/rss_results.inc Outdated Show resolved Hide resolved
islandora_solr_config/includes/rss_results.inc Outdated Show resolved Hide resolved
@bondjimbond
Copy link
Author

@jonathangreen OK, I finally understood what you were talking about... got the 'page' parameter from drupal_get_query_parameters and used that to parse the page numbers.

@bondjimbond
Copy link
Author

@jonathangreen Still not sure I can do anything about the $feed_url though -- I haven't found a Drupal function that does the job.

@bondjimbond
Copy link
Author

@jonathangreen I think everything works now! Any further items for the review?

@bondjimbond
Copy link
Author

@jonathangreen Added two more needed pieces: no "next" link if there is no next page, and remove the "page" attribute for the "previous" link if the previous page is the first page.

Copy link

@jonathangreen jonathangreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is coming along. I'd like to see none of the links use $base_url and all use the url function.

I'm also wondering if there should be code here thats setting the variable for paging that maybe wasn't checked in.

islandora_solr_config/includes/rss_results.inc Outdated Show resolved Hide resolved
islandora_solr_config/includes/rss_results.inc Outdated Show resolved Hide resolved
islandora_solr_config/includes/rss_results.inc Outdated Show resolved Hide resolved
islandora_solr_config/includes/rss_results.inc Outdated Show resolved Hide resolved
islandora_solr_config/includes/rss_results.inc Outdated Show resolved Hide resolved
@bondjimbond
Copy link
Author

All right! I had to learn a lot to make this work, but it's sorted out... All of your comments are addressed, @jonathangreen, I think.

@bondjimbond
Copy link
Author

Anything left, @jonathangreen?

@jonathangreen
Copy link

Off the top of my head I see a few debugging statements left there. Sorry its been a really busy week for me, but I'm going to try to take another pass at this next week.

@bondjimbond
Copy link
Author

Off the top of my head I see a few debugging statements left there

Ack, you're right. I'm too impatient with my pushes! Those have been removed, thanks.

@bondjimbond
Copy link
Author

Pinging @jonathangreen Anything left or is this good to go?

@jonathangreen
Copy link

So I think the code looks okay, but I don't think it's a valid RSS document anymore, which is probably a bad thing.

RFC-5005 has some details about adding pagination to RSS:
https://tools.ietf.org/html/rfc5005#page-12

I think it's probably reasonable to follow their suggestions about using the atom namespace for pagination. I think we should also have an option to return all results, perhaps setting the RSS page limit to 0. In that case we should follow their suggestion of not displaying these extra links.

Another issue I noticed while testing this is that the & character breaks XML parsing, so we probably need to make sure that is escaped, so things properly validate. I would like to see the output from this validate with the W3C validator (or at least not throw any additional warnings).

@jonathangreen
Copy link

@DiegoPino do you have any comments on how to do the pagination for RSS? I think you had mentioned you did at the last committers call.

@DiegoPino
Copy link

@jonathangreen i referenced the same document you just did https://tools.ietf.org/html/rfc5005#section-3, there is nothing that says rss/rss2 does not allow it but yes, all examples are for the Atom namespaces.
Wordpress allows pagination for RSS and some APIs enable it, independently of if Atom or pure RSS
https://mibe.github.io/FeedWriter/classes/FeedWriter.Feed.html#method_setPagination

@jonathangreen
Copy link

I feel like the RSS examples from the RFC using the atom namespace within RSS make sense to me:

   <?xml version="1.0"?>
   <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:fh="http://purl.org/syndication/history/1.0">
    <channel>
     <title>Liftoff News</title>
     <link>http://liftoff.example.net/</link>
     <description>Liftoff to Space Exploration.</description>
     <lastBuildDate>Fri, 30 May 2003 11:06:42 GMT</lastBuildDate>
     <fh:archive/>
     <atom:link rel="current"
      href="http://liftoff.example.net/index.rss"/>
     <atom:link rel="prev-archive"
      href="http://liftoff.example.net/2003/04/index.rss"/>

     <item>
      <title>Upcoming Eclipse</title>
      <link>http://liftoff.example.net/2003/05/30/eclipse</link>
      <description>Sky watchers in Europe, Asia, and parts of
      Alaska and Canada will experience a partial eclipse of the Sun
      on Saturday, May 31st.</description>
      <pubDate>Fri, 30 May 2003 11:06:42 GMT</pubDate>
      <guid>http://liftoff.example.net/2003/05/30/eclipse</guid>
     </item>
     <item>
      <title>The Engine That Does More</title>
      <link>http://liftoff.example.net/2003/05/27/vasmir</link>
      <description>Before man travels to Mars, NASA hopes to
      design new engines that will let us fly through the Solar
      System more quickly.  The proposed VASIMR engine would do
      that.</description>
      <pubDate>Tue, 27 May 2003 08:37:32 GMT</pubDate>
      <guid>http://liftoff.example.net/2003/05/27/vasmir</guid>
     </item>
    </channel>
   </rss>

@bondjimbond
Copy link
Author

@jonathangreen If I move the link from 'value' into 'attributes->href', I end up encoding all of the parameters. Adding 'encoded' => TRUE only seems to work for the text in 'value' and not in 'href'.

That is...

    $result['args'][] = array(
        'key' => 'link',
        'attributes' => array(
          'rel' => 'next',
          'href' => $link_next,
        'encoded' => TRUE,
      );
    }

gets me <link rel="next" href="http://localhost:8000/islandora/search/test?page=1&amp;type=dismax&amp;islandora_solr_search_navigation=0&amp;solr_profile=rss 1" />

Any idea if it's possible to get rid of the &amp conversion?

@jonathangreen
Copy link

jonathangreen commented Jul 8, 2019

& needs to be encoded to &amp; or its not valid XML anymore.
http://xml.silmaril.ie/specials.html

@bondjimbond
Copy link
Author

Yes, so that's the problem... this is supposed to be a link, and &amp in the link won't resolve properly.

@jonathangreen
Copy link

I'm going to dismiss my review and step back from this one and let someone with RSS experience review it. I don't think we should be emitting invalid XML with & in it. I would have assumed RSS readers would do the sensible thing and unescape the URL before using it, but if they don't I don't know what the solution is here.

@jonathangreen jonathangreen dismissed their stale review July 8, 2019 17:35

@Islandora/7-x-1-x-committers with more RSS knowledge then I can review this one.

@DonRichards
Copy link
Member

@jonathangreen Looking at the Example RSS from the WC3 XML 2.0 Specs a link can have a & character but not completely sure if there are limits within the <link> tag. Just a thought. I'm not really familiar with it but just giving my 3 seconds of research.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants